Iterative Spaced Seed Hashing: Closing the Gap Between Spaced Seed Hashing and k-mer Hashing.

Internal identifier: 000513 (Main/Exploration); previous: 000512; next: 000514

Authors: Enrico Petrucci [Italy]; Laurent Noé [France]; Cinzia Pizzi [Italy]; Matteo Comin [Italy]

Source:

Journal of computational biology: a journal of computational molecular cell biology [1557-8666]; 2019.

RBID : pubmed:31800307

Abstract

Alignment-free classification of sequences has enabled high-throughput processing of sequencing data in many bioinformatics pipelines. Much work has been done to speed up the indexing of k-mers through hash tables and other data structures. These efforts have led to very fast indexes, but because they are k-mer based, they often lack sensitivity due to sequencing errors or polymorphisms. Spaced seeds are a special type of pattern that accounts for errors or mutations. They improve sensitivity and are now routinely used instead of k-mers in many applications. The major drawback of spaced seeds is that they cannot be hashed efficiently, so their use substantially increases computational time. In this article we address the problem of efficient spaced seed hashing. We propose an iterative algorithm that combines multiple spaced seed hashes, exploiting the similarity of adjacent hash values to compute the next hash efficiently. We report a series of experiments on hashing HTS reads with several spaced seeds. Our algorithm computes the hash values of spaced seeds with a speedup of 3.5× to 7×, outperforming previous methods. Software and data sets are available at Iterative Spaced Seed Hashing.
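The idea of reusing adjacent hash values can be illustrated with a simplified Python sketch. It assumes a 2-bit nucleotide encoding (A=0, C=1, G=2, T=3), reads containing only uppercase A/C/G/T, and a seed written as a string of '1' (care) and '0' (don't care) positions; all function names are hypothetical. This is not the authors' algorithm: it copies one 2-bit symbol at a time from an earlier hash value whenever the seed's shape allows it, so it shows which symbols can be recovered without rereading the sequence, and makes no claim about the reported speedup.

```python
from typing import Dict, List, Tuple

# Assumed 2-bit encoding; any fixed 2-bit code would work the same way.
ENC: Dict[str, int] = {"A": 0, "C": 1, "G": 2, "T": 3}


def care_positions(seed: str) -> List[int]:
    """Offsets of the '1' (care) positions of a spaced seed such as '1101011'."""
    return [k for k, c in enumerate(seed) if c == "1"]


def naive_hashes(read: str, seed: str) -> List[int]:
    """Hash every window from scratch: each care position is read and encoded."""
    pos = care_positions(seed)
    hashes = []
    for i in range(len(read) - len(seed) + 1):
        h = 0
        for slot, p in enumerate(pos):
            h |= ENC[read[i + p]] << (2 * slot)
        hashes.append(h)
    return hashes


def reuse_plan(seed: str) -> List[Tuple[int, int]]:
    """For each slot (care offset p), find the smallest shift d such that p + d
    is also a care offset, stored at slot j. Then slot `slot` of h(i) equals
    slot j of h(i - d), since both encode read[i + p]. (0, -1) means no reuse."""
    pos = care_positions(seed)
    slot_of = {p: j for j, p in enumerate(pos)}
    plan = []
    for p in pos:
        choice = (0, -1)
        for d in range(1, len(seed)):
            if p + d in slot_of:
                choice = (d, slot_of[p + d])
                break
        plan.append(choice)
    return plan


def reuse_hashes(read: str, seed: str) -> List[int]:
    """Same values as naive_hashes, but symbols are copied from earlier hash
    values when the plan allows; the read is accessed only for the rest."""
    pos = care_positions(seed)
    plan = reuse_plan(seed)
    hashes: List[int] = []
    for i in range(len(read) - len(seed) + 1):
        h = 0
        for slot, (d, j) in enumerate(plan):
            if d > 0 and i - d >= 0:
                sym = (hashes[i - d] >> (2 * j)) & 0b11  # reuse an earlier hash
            else:
                sym = ENC[read[i + pos[slot]]]  # fall back to the sequence
            h |= sym << (2 * slot)
        hashes.append(h)
    return hashes


if __name__ == "__main__":
    read = "ACGTTGCAAGCTTACG"
    seed = "1101011"
    print(reuse_plan(seed))  # [(1, 1), (2, 2), (2, 3), (1, 4), (0, -1)]
    print(reuse_hashes(read, seed) == naive_hashes(read, seed))  # True
```

For a contiguous seed such as "1111", `reuse_plan` copies every slot but the last from the previous window (shift 1), which is the usual rolling k-mer hash. For a spaced seed, the gaps force longer shifts and leave more symbols to be read directly, which is one way to see why spaced seeds are harder to hash than k-mers.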

DOI: 10.1089/cmb.2019.0298
PubMed: 31800307

Affiliations:

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Iterative Spaced Seed Hashing: Closing the Gap Between Spaced Seed Hashing and <i>k</i>-mer Hashing.</title>
<author><name sortKey="Petrucci, Enrico" sort="Petrucci, Enrico" uniqKey="Petrucci E" first="Enrico" last="Petrucci">Enrico Petrucci</name>
<affiliation wicri:level="1"><nlm:affiliation>Department of Information Engineering, University of Padova, Padova, Italy.</nlm:affiliation>
<country xml:lang="fr">Italie</country>
<wicri:regionArea>Department of Information Engineering, University of Padova, Padova</wicri:regionArea>
<wicri:noRegion>Padova</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Noe, Laurent" sort="Noe, Laurent" uniqKey="Noe L" first="Laurent" last="Noé">Laurent Noé</name>
<affiliation wicri:level="3"><nlm:affiliation>CRIStAL UMR9189, Université de Lille, Lille, France.</nlm:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>CRIStAL UMR9189, Université de Lille, Lille</wicri:regionArea>
<placeName><region type="region">Hauts-de-France</region>
<region type="old region">Nord-Pas-de-Calais</region>
<settlement type="city">Lille</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Pizzi, Cinzia" sort="Pizzi, Cinzia" uniqKey="Pizzi C" first="Cinzia" last="Pizzi">Cinzia Pizzi</name>
<affiliation wicri:level="1"><nlm:affiliation>Department of Information Engineering, University of Padova, Padova, Italy.</nlm:affiliation>
<country xml:lang="fr">Italie</country>
<wicri:regionArea>Department of Information Engineering, University of Padova, Padova</wicri:regionArea>
<wicri:noRegion>Padova</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Comin, Matteo" sort="Comin, Matteo" uniqKey="Comin M" first="Matteo" last="Comin">Matteo Comin</name>
<affiliation wicri:level="1"><nlm:affiliation>Department of Information Engineering, University of Padova, Padova, Italy.</nlm:affiliation>
<country xml:lang="fr">Italie</country>
<wicri:regionArea>Department of Information Engineering, University of Padova, Padova</wicri:regionArea>
<wicri:noRegion>Padova</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2019">2019</date>
<idno type="RBID">pubmed:31800307</idno>
<idno type="pmid">31800307</idno>
<idno type="doi">10.1089/cmb.2019.0298</idno>
<idno type="wicri:Area/PubMed/Corpus">000343</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000343</idno>
<idno type="wicri:Area/PubMed/Curation">000343</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000343</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000508</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000508</idno>
<idno type="wicri:Area/Ncbi/Merge">002421</idno>
<idno type="wicri:Area/Ncbi/Curation">002421</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">002421</idno>
<idno type="wicri:Area/Main/Merge">000516</idno>
<idno type="wicri:Area/Main/Curation">000513</idno>
<idno type="wicri:Area/Main/Exploration">000513</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Iterative Spaced Seed Hashing: Closing the Gap Between Spaced Seed Hashing and <i>k</i>-mer Hashing.</title>
<author><name sortKey="Petrucci, Enrico" sort="Petrucci, Enrico" uniqKey="Petrucci E" first="Enrico" last="Petrucci">Enrico Petrucci</name>
<affiliation wicri:level="1"><nlm:affiliation>Department of Information Engineering, University of Padova, Padova, Italy.</nlm:affiliation>
<country xml:lang="fr">Italie</country>
<wicri:regionArea>Department of Information Engineering, University of Padova, Padova</wicri:regionArea>
<wicri:noRegion>Padova</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Noe, Laurent" sort="Noe, Laurent" uniqKey="Noe L" first="Laurent" last="Noé">Laurent Noé</name>
<affiliation wicri:level="3"><nlm:affiliation>CRIStAL UMR9189, Université de Lille, Lille, France.</nlm:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>CRIStAL UMR9189, Université de Lille, Lille</wicri:regionArea>
<placeName><region type="region">Hauts-de-France</region>
<region type="old region">Nord-Pas-de-Calais</region>
<settlement type="city">Lille</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Pizzi, Cinzia" sort="Pizzi, Cinzia" uniqKey="Pizzi C" first="Cinzia" last="Pizzi">Cinzia Pizzi</name>
<affiliation wicri:level="1"><nlm:affiliation>Department of Information Engineering, University of Padova, Padova, Italy.</nlm:affiliation>
<country xml:lang="fr">Italie</country>
<wicri:regionArea>Department of Information Engineering, University of Padova, Padova</wicri:regionArea>
<wicri:noRegion>Padova</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Comin, Matteo" sort="Comin, Matteo" uniqKey="Comin M" first="Matteo" last="Comin">Matteo Comin</name>
<affiliation wicri:level="1"><nlm:affiliation>Department of Information Engineering, University of Padova, Padova, Italy.</nlm:affiliation>
<country xml:lang="fr">Italie</country>
<wicri:regionArea>Department of Information Engineering, University of Padova, Padova</wicri:regionArea>
<wicri:noRegion>Padova</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j">Journal of computational biology : a journal of computational molecular cell biology</title>
<idno type="eISSN">1557-8666</idno>
<imprint><date when="2019" type="published">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><b>Alignment-free classification of sequences has enabled high-throughput processing of sequencing data in many bioinformatics pipelines. Much work has been done to speed up the indexing of <i>k</i>-mers through hash tables and other data structures. These efforts have led to very fast indexes, but because they are <i>k</i>-mer based, they often lack sensitivity due to sequencing errors or polymorphisms. Spaced seeds are a special type of pattern that accounts for errors or mutations. They improve sensitivity and are now routinely used instead of <i>k</i>-mers in many applications. The major drawback of spaced seeds is that they cannot be hashed efficiently, so their use substantially increases computational time. In this article we address the problem of efficient spaced seed hashing. We propose an iterative algorithm that combines multiple spaced seed hashes, exploiting the similarity of adjacent hash values to compute the next hash efficiently. We report a series of experiments on hashing HTS reads with several spaced seeds. Our algorithm computes the hash values of spaced seeds with a speedup of 3.5× to 7×, outperforming previous methods. Software and data sets are available at Iterative Spaced Seed Hashing.</b>
</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
<li>Italie</li>
</country>
<region><li>Hauts-de-France</li>
<li>Nord-Pas-de-Calais</li>
</region>
<settlement><li>Lille</li>
</settlement>
</list>
<tree><country name="Italie"><noRegion><name sortKey="Petrucci, Enrico" sort="Petrucci, Enrico" uniqKey="Petrucci E" first="Enrico" last="Petrucci">Enrico Petrucci</name>
</noRegion>
<name sortKey="Comin, Matteo" sort="Comin, Matteo" uniqKey="Comin M" first="Matteo" last="Comin">Matteo Comin</name>
<name sortKey="Pizzi, Cinzia" sort="Pizzi, Cinzia" uniqKey="Pizzi C" first="Cinzia" last="Pizzi">Cinzia Pizzi</name>
</country>
<country name="France"><region name="Hauts-de-France"><name sortKey="Noe, Laurent" sort="Noe, Laurent" uniqKey="Noe L" first="Laurent" last="Noé">Laurent Noé</name>
</region>
</country>
</tree>
</affiliations>
</record>

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000513 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000513 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     pubmed:31800307
   |texte=   Iterative Spaced Seed Hashing: Closing the Gap Between Spaced Seed Hashing and k-mer Hashing.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:31800307" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021

MERS exploration server

Warning: this site is under development! Warning: this site was generated automatically from raw corpora. The information has therefore not been validated.